Gated Recurrent Context: Softmax-Free Attention for Online Encoder-Decoder Speech Recognition


Abstract

Recently, attention-based encoder-decoder (AED) models have shown state-of-the-art performance in automatic speech recognition (ASR). As the original AED models with global attention are not capable of online inference, various online attention schemes have been developed to reduce ASR latency for better user experience. However, a common limitation of the conventional softmax-based online attention approaches is that they introduce an additional hyperparameter related to the length of the attention window, requiring multiple trials of model training to tune the hyperparameter. To deal with this problem, we propose a novel softmax-free attention method and its modified formulation for online attention, which does not need any additional hyperparameter at the training phase. Through a number of ASR experiments, we demonstrate that the tradeoff between the latency and performance of the proposed online attention technique can be controlled by merely adjusting a threshold at the test phase. Furthermore, the proposed methods showed competitive performance in terms of word error rates (WERs).
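The abstract names the ingredients (softmax-free, gated recurrent context, a test-time threshold) but not the equations, so the following is only a hypothetical illustration, not the paper's formulation. It contrasts conventional softmax attention, whose weights are normalized over every encoder frame, with an assumed gated recurrent update in which each frame gets an independent sigmoid gate and the context is accumulated left to right. The function names, the update rule `c_j = (1 - z_j) * c_{j-1} + z_j * v_j`, and the stopping rule controlled by `stop_threshold` are all assumptions made for the sketch.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax_attention(query, keys, values) -> Tuple[Vector, Vector]:
    """Conventional global attention: weights are normalized over ALL frames,
    so every encoder frame must be available before the context is formed."""
    scores = [dot(query, k) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return context, weights


def gated_recurrent_context(
    query, keys, values, stop_threshold: Optional[float] = None
) -> Tuple[Vector, int]:
    """HYPOTHETICAL softmax-free context: each frame gets an independent sigmoid
    gate z_j and the context is updated left to right as
    c_j = (1 - z_j) * c_{j-1} + z_j * v_j, with no normalization across frames.
    If stop_threshold is given, reading stops at the first frame whose gate
    reaches it (an illustrative test-time rule, not the paper's)."""
    dim = len(values[0])
    context = [0.0] * dim
    frames_read = 0
    for k, v in zip(keys, values):
        z = sigmoid(dot(query, k))
        context = [(1.0 - z) * c + z * x for c, x in zip(context, v)]
        frames_read += 1
        if stop_threshold is not None and z >= stop_threshold:
            break
    return context, frames_read


if __name__ == "__main__":
    query = [1.0, 0.0]
    keys = [[0.0, 0.0], [2.0, 0.0], [-1.0, 0.0]]
    values = [[1.0], [2.0], [3.0]]
    _, weights = softmax_attention(query, keys, values)
    print("softmax weights:", [round(w, 3) for w in weights])
    for tau in (None, 0.8):
        ctx, n = gated_recurrent_context(query, keys, values, stop_threshold=tau)
        print(f"threshold={tau}: read {n} of {len(keys)} frames, context={ctx}")
```

In this toy example, lowering `stop_threshold` makes the loop stop after fewer encoder frames, which mirrors the kind of latency knob the abstract describes. Because the gates in the sketch are not normalized against one another, no window length has to be fixed before training; whether the paper's method works this way should be checked against the full text.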

Related articles

A study of the recurrent neural network encoder-decoder for large vocabulary speech recognition

Deep neural networks have advanced the state-of-the-art in automatic speech recognition, when combined with hidden Markov models (HMMs). Recently there has been interest in using systems based on recurrent neural networks (RNNs) to perform sequence modelling directly, without the requirement of an HMM superstructure. In this paper, we study the RNN encoder-decoder approach for large vocabulary ...

A Recurrent Encoder-Decoder Network for Sequential Face Alignment

We propose a novel recurrent encoder-decoder network model for real-time video-based face alignment. Our proposed model predicts 2D facial point maps regularized by a regression loss, while uniquely exploiting recurrent learning at both spatial and temporal dimensions. At the spatial level, we add a feedback loop connection between the combined output response map and the input, in order to ena...

Recurrent Neural Network-Based Sentence Encoder with Gated Attention for Natural Language Inference

The RepEval 2017 Shared Task aims to evaluate natural language understanding models for sentence representation, in which a sentence is represented as a fixed-length vector with neural networks and the quality of the representation is tested with a natural language inference task. This paper describes our system (alpha) that is ranked among the top in the Shared Task, on both the in-domain test ...

Attention-based Information Fusion using Multi-Encoder-Decoder Recurrent Neural Networks

With the rising number of interconnected devices and sensors, modeling distributed sensor networks is of increasing interest. Recurrent neural networks (RNN) are considered particularly well suited for modeling sensory and streaming data. When predicting future behavior, incorporating information from neighboring sensor stations is often beneficial. We propose a new RNN based architecture for c...

A Hierarchical Encoder-Decoder Model for Statistical Parametric Speech Synthesis

Current approaches to statistical parametric speech synthesis using neural networks generally require input at the same temporal resolution as the output, typically a frame every 5 ms, or in some cases at the waveform sampling rate. It is therefore necessary to fabricate highly redundant frame-level (or sample-level) linguistic features at the input. This paper proposes the use of a hierarchical enco...

Journal

Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Year: 2021

ISSN: 2329-9304, 2329-9290

DOI: https://doi.org/10.1109/taslp.2021.3049344